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TIME  SERIES  MODEL  IDENTIFICATION, 
SPECTRAL  ESTIMATION,  AND  FUNCTIONAL  INFERENCE 

Emanuel  Parzen 

Institute  of  Statistics 
Texas  A&M  University 
College  Station,  TX  77843  USA 


Statistical  spectral  analysis  has  a  literature  which 
expands  constantly.  Every  collection  of  papers  about  signal 
processing  contains  many  papers  discussing  and  extending  the 
methodology  of  spectral  analysis.  Researchers  in  every 
scholarly  discipline  attempt  to  apply  spectral  analysis  to 
problems  in  their  field.  Consequently  I  believe  that  the 
disciplines  of  time  series  analysis  and  spectral  analysis  need 
to  be  developed  in  a  general  way  that  is  capable  of  treating 
problems  in  all  the  diverse  fields  in  which  they  occur. 

-^This  survey  talk  seeks  to  emphasize  the  following  ideas : 
v-  t^.  Functional  inference  formulation  of  parameter 
estimation* 

Parameter  estimation  and  information  theoryj 

3.  Information  divergence  of  spectral  density  functions^ 
.4.  Model  identification,  prediction  theory,  and  memory^1 

5.  ARMA  model  identification1 for  short  memory  time  series 

6.  Model  identification  of  long  memory  time  series' 

7.  The  array  of  spectral  estimators^  Cx  VV  vj 

,8.  Quantile  approach  to  non-Gaussian  time  series  analysis 

y 

To  be  presented  at  Workshops  on  Signal  Processing  in  the  Ocean 
Environment,  April/May,  1982.  Research  supported  in  part  by 
the  Office  of  Naval  Research. 
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1.  Functional  Inference  Formulation  of  Parameter  Estimation 
A  general  formulation  of  statistical  theory  as  methods  of 
analysis  of  statistical  data  assumes  that  a  statistical  question 
starts  with  a  probability  model  for  the  observed  data  set,  or 
sample,  which  is  a  function  of  a  parameter  to  be  estimated; 

f (sample | parameter) 

represents  the  probability  density  function  of  the  sample  as 
a  function  of  the  parameter. 

Classical  statistical  inference  assumes  that  the  parameter 
is  a  finite  dimensional  vector  0  =  (0^,...,0^).  Functional 
inference  assumes  that  the  parameter  is  a  function,  such  as 
f(u),  0<w£  1 . 

Parameter  estimators  are  best  determined  by  one  of  two 
general  statistical  principles:  Bayes  theorem  or  minimum 
divergence . 

Bayes  theorem  assumes  a  prior  distribution  for  the 
parameter,  and  computes  the  posterior  distribution  of  the 
parameter  given  the  sample,  denoted 

f (parameter j sample) 

A  minimum  divergence  method  introduces  a  function  I(f;fQ) 

which  measures  the  divergence  between  the  true  density  f  and  a 

proposed  density  f  .  If  one  can  form  a  raw  estimator  f  of  the 

0 

/v 

probability  density  of  the  sample,  then  an  estimator  0  is 

A 

determined  by  minimizing  I(f;f0)  with  respect  to  0.  We  call  0  a 
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minimum  divergence  estimator.  When  e  is  a  finite  dimensional 

A 

vector,  0  often  exists;  we  call  f~  a  parametric-exact  estimator 
of  the  true  f.  When  e  is  a  function  we  have  two  main 
approaches  to  determine  estimators  e  which  we  call 
non-parametric  penalty  and  parametric- select . 

A  non-parametric  penalty  method  determines  0  by  minimizing 

I(f;f0)  +  X  J(0) 

where  J(0)  is  a  measure  of  the  smoothness  of  the  function 

(tending  to  °°  as  0  becomes  less  smooth)  and  X  is  a  penalty 

parameter  which  is  chosen  by  the  researcher  to  balance  the 

fidelity  measure  I(f;fQ),  assumed  to  tend  to  0  as  0  becomes 

less  smooth,  with  the  smoothness  measure  J(e). 

A  parametric- select  method  determines  0  by  approximating 

0  by  a  vector  0m  of  suitable  dimension  m,  called  the  order, 

and  setting  0  =  0m  where  ©m  minimizes  over  0m  the  divergence 

I (f ; f q  ).  The  problem  of  selecting  the  "best"  order  m  is  a 
m 

problem  of  model  identification;  the  problem  of  then  estimating 
can  often  be  treated  as  a  problem  of  classical  statistical 
estimation  of  a  finite  dimensional  vector. 

A 

When  an  estimator,  denoted  0,  is  used  as  an  estimator  of  0, 

one  has  to  take  into  account  two  kinds  of  errors,  called 

respectively  bias  and  variance.  Bias  is  a  measure  of  the 

deterministic  difference  between  0  and  0,  while  variance  is  a 

m 

A 

measure  of  the  stochastic  distance  between  0_  and  0_-  As  m 

m  m 

increases  bias  decreases  while  variance  increases.  This  is  an 
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example  of  the  fundamental  problem  of  empirical  spectral  analysis 
which  is  how  to  achieve  an  optimal  balance  between  bias  and 
variance.  [When  one  uses  autoregressive  spectral  estimation, 
this  problem  reduces  to  a  question  of  determining  the  order  m 
of  the  approximating  autoregressive  scheme.] 

2.  Parameter  Estimation  and  Information  Theory 


A  general  approach  to  determining  divergence  measures  is 
provided  by  information  theory.  Let  f(y)  and  g(y)  be  two 
probability  densities  on  the  real  line,  -co<y<<».  The  information 
divergence  of  index  a  of  a  (model)  g  from  (a  true  density)  f 
is  defined  for  a  =  1  (index  1)  by 


Ijtfig)  -  /  ("log  f(y)  dy 


and  for  a>0  (but  afl)  by 


^  )*  af(x>  dx 


Vf;s>  ‘  &  l°*  r  <§$■  >  fw 


Information  divergence  of  index  1  has  a  preferred  role  because 
it  has  an  important  decomposition 


IL(f;g)  -  H(f ; g)  -  H(f ) 


H(f ; g)  -  /"{-log  g(y)}  f(y)  dy, 


H(f)  -  H(f ; f)  -  /"{-log  f (y)}  f (y)  dy. 


defining 


We  call  H(f ; g)  the  cross-entropy  of  f  and  g,  and  H(f)  the 
entropy  of  f. 

Information  divergence  of  index  1  is  usually  referred  to 
just  as  information  divergence. 

Another  fundamental  decomposition  concerns  the  information 
I(Y[X)  about  a  continuous  random  variable  Y  in  a  continuous 
random  variable  X,  defined  by 

I(Y|X)  =  I1(fY(x;  fy) 

=  EXIl<fY|X=x;fY) 

The  entropy  of  Y  and  conditional  entropy  of  Y  given  X  are 
defined  by 

H(Y)  =  H(fy) 

H(Y1X)  -  H(£y,x)  -  ExH(£y|x,x) 

One  can  show  that 

I (Y |X)  -  H(Y)  -  H(Y|X). 

To  apply  these  concepts  to  the  problem  of  identifying  a 
probability  model  for  a  sequence  of  random  variables 
Y(l),  Y(2),  we  define  the  information  divergence  between 

the  probability  densities  for  the  infinite  sequence  by 
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Ix(f ;g) 


lim 

r£-*-oo 


3.  Information  Divergence  of  Spectral  Density  Functions 

We  next  consider  a  time  series  Y(t),  t=0,  +1,...  which  is 
a  zero  mean  Gaussian  stationary  time  series  with  covariance 
function 

R(v)  =  E[Y(t)Y(t+v)l 
correlation  function 


p(v)  -  =  Corr  [Y(t) , Y(t+v) ] 

Despite  the  possible  confusion  with  a  probability  density,  we 
use  f  to  denote  the  spectral  density  function 

f(u>)  =  l  e"2llivu  p(v),  0<W<1. 

V— —  oo 

oo 

assuming  j[  |p(v)|<®.  The  frequency  variable  w  is  usually 

-yss  —  oo 

assumed  to  vary  in  the  interval  -0 . 5<u><0 . 5 .  But  only  the 
interval  0<w<0.5  has  physical  significance.  We  prefer  the 
interval  0<u><_l  for  mathematical  reasons . 

A  theorem  of  Pinsker  (1963)  can  be  interpreted  as 
providing  a  formula  for  the  information  divergence  between  the 
probability  density  of  a  zero  mean  Gaussian  stationary  time 
series  (normalized  by  its  variance)  with  (true)  spfectral  density 
f(w),  and  the  probability  density  of  a  zero  mean  Gaussian 
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stationary  time  series  (normalized  by  its  variance)  with 
(proposed  model)  spectral  density  f0(u>): 


I1(f(o>);  fQ  0*0) 


1}  du> 


It  has  an  information  decomposition: 


I1(f(u);f0(a)))  =  H(f  (d)  ;  f  Q  (w)  )  -  H(f(u))), 
defining  cross  entropy 

H(f  (w)  ; f g  (u>) )  =  ];  f1  {log  f0(<O  +  dw 

O  0 

and  entropy 

H(f (w) )  -  \  f1  (log  f  (u>)  +  1} 
o 

From  a  time  series  sample  Y(t),  t=l,2 . T  one  can  form 

a  raw  estimator  f(w)  of  the  true  spectral  density  f(u>)  by 

f  (w)  -  l  Y(t)  e“2lTiutr  *  \  Y2  (t) 

t«l  '  t=l 

The  sample  correlation  function 

p(v)  -  Tf  Y(t)  Y(t+v)  r  l  Y2  (t) 
t»l  t-1 


is  confuted  by  (for  0<v<Q-T“M) 
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^  I  Q;1  k  ~  k 

p(v)  -  7j  l  exp  (2tt1  £  v)  f(£)  . 

w  k“0  * 

i 

Estimating  the  parameters  0  [of  a  parametric  model  f  0  (to)  3 
by  minimizing  the  information  divergence 

I(f  (u>)  ;fQ  (w)) 

or  equivalently  the  cross -entropy 
H(f (w) ; f Q (w) ) 

is  asymptotically  equivalent  to  the  method  of  maximum  likelihood. 

An  important  example  of  the  foregiong  general  considerations 
is  the  autoregressive  model  (of  order  m)  for  a  spectral  density. 
It  has  parameters  9  =  (a^,  am(l) , . . . , am(m) )  and  is  defined  by 

£m<“>  *  "ml  fhn<<=2"l">  I'2  • 

where 

gjjjCz)  -  1  +  am(l)z+.  .  .+am(m)  zm 

has  all  its  roots  in  |zj>l,  the  exterior  of  the  unit  circle 
in  the  complex  z-domain.  Then 

2H<f;f  )  -  +  i-  I1  f(«i)|gm(e2’1“)|‘  d«> 

ol  o 


j 


ik 
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A  ^  A 

is  minimized  by  cr^,  am(l)  ,  .  .  .  ,  am(m)  satisfying  the  sample 
Yule-Walker  equations.  The  autoregressive  spectral  estimator 


fm<“> 


2ir±w^ 


-2 


is  a  parametric- exact  estimator  when  the  time  series  Y(t) 
obeys  an  autoregressive  scheme  of  order  m,  and  is  a  parametrifc- 
select  estimator  when  the  autoregressive  scheme  is  adopted 
as  an  approximating  model. 

Maximum  entropy  characterization  of  AR  spectra.  The 
spectral  density  f(u>)  that  maximizes  entropy  H(f(w))  among  all 
f(w)  satisfying  the  constraints 

jle27riujf(u)  du  ,  p(j)>  j=0,+l . jin 

o 

for  specified  correlations  p(j)  is  the  autoregressive  spectral 
density  fm(w)  with  coefficients  determined  by  the  Yule-Walker 
equations.  A  "one- line”  proof  of  this  fundamental  fact,  originally 
stated  by  Burg,  is  as  follows:  from 

"(f-.y  -  \  Uog  04  +  1) 
it  follows  that 

0  1  Ix (f ; fm>  -  H(f;fm)  -  H(f ) 

-  H(fm)  -  H(f ) 

and 

H(f )  <  H(fm). 

The  foregoing  simple  proof  of  the  maximum  entropy  character 
of  autoregressive  spectral  densities  is  analogous  to  a  proof  of 
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Che  maximum  entropy  character  of  exponential  model  probability 
densities . 

4.  Model  Identification,  Prediction  Theory,  and  Memory 

Discussions  of  general  statistical  principles  are  usually 
concerned  with  the  principles  of  parameter  estimation.  The  more 
important  problem  of  model  identification  does  not  yet  receive 
the  systematic  attention  and  emphasis  merited  by  its  crucial 
importance.  I  believe  we  are  in  a  position  to  describe 
qualitatively  the  types  of  models  that  are  usually  fitted  to 
"Gaussian"  time  series  that  are  analyzed  by  methods  related  to 
spectral  analysis.  We  distinguish  4  model  types  which  we  call 

1.  No  memory  or  white  noise 

2.  Short  memory  or  stationary 

3.  Long  memory  (or  non- stationary) 

3a.  Long  memory:  transform  to  short  memory 

3b.  Long  memory:  long  memory  plus  short  memory. 

In  the  definition  and  identification  of  these  models,  we 
use  the  ideas  of  prediction  theory.  The  information  about  a 
time  series  Y(t)  at  time  t  in  the  m  most  recent  values 
Y(t-l) . Y(t-m)  is  denoted 

Im  *  I(Y'Y-1 . Y-m> 

-  I(Y(t)|Y(t-l) . Y(t-m)) 

For  a  Gaussian  stationary  time  series 

!m  -  -  7  log  0^ 
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where 

Yy’m(t)  -  E[Y(t) | Y(t-l) . Y (t-m)  ] 

Yv,m(t)  =  Y (t)  -  Yp,m  (t) 

=  E[  |  Yv,m(t)  j  2  ]  tE[  I  Y(t)  |  2  ] 

As  m  tends  to  I  tends  to 

m 

Im  =  I(Y|Y")  -  I (Y (t) i Y (t-1) _ ) 

=  -  i  log  a 4=  -  J*-  log  f(u>)  dai  *  -H(f(a))>  +  %. 

z  ^  o 

Further,  if  H(f)>-«,  then  H(fm)-*H(f)  and  I(f;fm)-*0. 

We  then  define  a  time  series  Y(t)  to  be 
no  memory:  =  1 ,  1^=0 

short  memory:  0  <  a2  <  1»  0<I  <® 

CO  00 

long  memory:  ■=  0,  1^  =  « . 

In  terms  of  the  dynamic  range  of  the  spectral  density  f(u)), 

DR<£>  -  0<“l  f<“>  *  0Z1  fM 

we  define  intuitively 

no  memory:  DR(f)  *  1 
short  memory:  1  <  DR(f)  <  » 
long  memory:  DR(f)  -  ». 
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The  models  we  build  for  a  time  series  depend  on  its  memory 
type.  A  model  corresponds  to  a  transformation  of  the  time  series 
to  a  white  noise  series.  Therefore  a  no  memory  or  white  noise 
time  series  requires  no  further  modeling,  although  one  may  be 
interested  in  determining  such  statistical  characteristics  as 
the  mean,  variance,  and  probability  distribution. 

5 .  ARMA  Model  Identification  for  Short  Memory  Time  Series 
i  A  short  memory  time  series  Y(t)  is  modeled  by  an  invertible 

|  filter  which  transforms  it  to  white  noise: 

—  e(t)  -  Yv(t) 

The  infinite  memory  prediction  errors  are  denoted 

Yv(t)  =  Y (t)  -  Yy(t)  =  ga>(L)Y(t), 

g^Cz)  =  l+a^Cl)  z+.  .  .+aoo(m)zm+.  .  .  , 

L  Y(t)  =  Y(t-l) 

We  call  Yv(t)  the  innovation  series  and  it  is  a  white  noise 
time  series  with  variance  o^R(O) .  In  general  a  short  memory 
time  series  is  modelled  by  representing  it,  or  approximating 
it,  by  an  ARMA  (p,q)  scheme: 

Y(t)  +  cip(l)  Y(t-l)+...+ap(p)  Y(t-p) 

-  £ (t)  +  Bq(l)  e(t-l)+. . .+Bq(q)e(t-q) 


< 
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where  the  polynomials 

gp (2)  -  l+ctp(l)z+.  .  .+ap(p)  zP 
hq(z)  *  l+8q(l)z+.  •  .+Bq(q)  zq 

are  chosen  so  that  all  their  roots  in  the  complex  z-plane  are 
in  the  region  {z:|z|>l}  outside  the  unit  circle.  Then  gp(z)  and 
hq(z)  are  the  transfer  functions  of  invertible  filters.  e(t) 
is  assumed  to  be  a  white  noise  time  series  which  we  identify 
with  the  innovations  e(t)  *  Yv(t); 

a2  q  »  E[ e 2 (t) ]  t  E[Y2 (t) ] 

is  an  estimator  of  aK  The  spectral  density  of  an  ARMA(p,q) 
scheme  is 

|h  (e2lTiw)  | 2 
°p»q  |  g  1 2 

The  process  of  identifying  ARMA  (p,q)  schemes  which  are 
adequate  (and  parsimonious)  approximating  models  for  a  time 
series  can  be  studied  rigorously,  and  various  at  least  semi¬ 
automatic  methods  are  available  which  are  based  on  order 
determining  schemes. 

The  conditions  under  which  the  exact  (or  true)  model  is 
an  AR(p)  or  ARMA(p,q)  can  be  stated  in  terms  of  information 
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numbers  .  Define  the  information  about  Y  in  X2  conditional  on 
by 


I(Y|X1;X1,X2< 


>  ■  H«YIX>  ‘  H<f, 


|X1,X2' 


-  HCYlXj)  -  H (Y j X^ , X2 ) . 


Then  Y  is  AR(p)  is  equivalent  to  (where  Y  denotes  the 
infinite  past  Y  , ,  Y_2,...) 

0  -  I(Y|Y.X . Y.p;Y-)  -  I.  -  Ip  ; 

Y  is  ARMA(p,q)  is  equivalent  to 

0*  I(Y|Y_1,...,Y_pI  Y^x .... *Y^qj  Y”) 

Given  a  time  series  sample,  Y(t),  t=l,2,...T,  of  length  T, 
one  can  calculate  successively  (using  Fast  Algorithms  such  as 
the  Yule-Walker  equations)  estimators 

Ip  =  -  \  log  Op,  p-1,  2,  ... 

which  can  be  regarded  as  test  statistics  for  testing  white  noise, 
or  more  precisely  AR(0)  against  AR(p).  The  work  of  Akaike 
(1974,  1977)  and  Hannan  and  Quinn  (1949)  leads  one  to  conjecture 
that  a  universal  test  for  white  noise  (whose  theory  needs 
further  study)  is  of  the  form  (for  a  suitable  choice  of  constant 
c>0,  say  c*l) 
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-  J  log  CTp  -  log  log  T  <  ip  for  p-1,  2,  .  .  . 

A  related  conjecture  is  that  optimal  orders  p  of  approximating 
autoregressive  schemes  can  be  identified  by  first  determining 
the  orders  at  which  are  attained  the  absolute  and  relative 
minima  of  order  determining  criteria  which  determine  orders  p 

.  A 

for  which  IB  -  Ip  is  not  significantly  different  from  zero. 

Akaike's  order  determining  criterion  AIC  is  defined  by 

AIC  (m)  =  log  ; 

it  seeks  to  determine  the  order  of  an  exact  autoregressive 
scheme  which  the  time  series  is  assumed  to  obey.  One  can 
raise  the  objection  against  it  that  it  does  not  consistently 
estimate  the  order,  which  is  done  by  a  criterion  due  to  Hannan 
and  Quinn  (1979) : 

AICHQ(m)  -  log  log  log  T 

Parzen  (1974),  (1977)  introduced  an  approximating  auto¬ 
regressive  order  criterion  called  CAT  (criterion  autoregressive 
transfer  function),  defined  by 

CAT (m)  -  \  (1-i)  Oj2  -  (1-  !jb  S"2 

One  chooses  the  value  of  CAT(O),  such  as 


CAT(O)  =  -(1  + 

in  order  to  accept  the  hypothesis  of  white  noise  when  it  is 
true  a  specified  percentage  (say  90%)  of  the  time.  In 
practice  CAT  and  AIC  lead  in  many  examples  to  exactly  the  same 
orders.  It  appears  reassuring  that  quite  different  conceptual 
foundations  can  lead  to  similar  conclusions  in  practice. 

An  important  application  of  fitting  an  approximating  auto¬ 
regressive  scheme  AR(p)  to  a  time  series  is  the  estimation  of 
information  numbers  which  are  used  to  determine  the  goodness  of  fit 
of  ARMA  (p , q)  schemes.  It  should  be  emphasized  that  the  ultimate 
decision  on  the  adequacy  of  a  model  should  be  based  on  a  definition 
of  "parsimony  of  a  model"  which  requires  that  the  spectral 
distribution  function  of  the  residuals  (Y | variables  in  model) v  (t) 
be  "parsimoniously"  not  significantly  different  from  white  noise. 

The  spectral  density  of  the  memory  p  prediction  errors 
Yv,p(*)  can  be  expressed  in  terms  of  gpU)>  the  autoregressive 
transfer  function  of  order  p,  by 


4  l8p(e2'1“)l!  f<“> 

CT  V 


If  the  time  series  Y(*)  is  in  fact  AR(p),  then  its  spectral 
density  equals  the  approximating  autoregressive  spectral 
density  fp(w)  -  o *  | gp(e2Triu))  |"2 

A  time  series  Y(»)  can  be  regarded  as  approximated  by  an 
AR(p)  if 


can  be  regarded  as  not  "significantly"  different  from  a  constant. 
In  this  way  a  test  of  the  hypothesis  that  a  time  series  Y(*) 
is  AR(p)  can  be  converted  to  a  test  of  the  hypothesis  that  the 
prediction  error  time  series  is  white  noise. 

Extremely  useful  diagnostics  concerning  model  identification 
and  fit  are  provided  by  spectral  distribution  functions 


F<«)  -  2  /“  f(co')  du>\  0<oi<0 . 5 
o 

6.  Model  Identification  of  Long  Memory  Time  Series 

A  time  series  is  diagnosed  as  being  long  memory  when  it 
is  not  no  memory  or  short  memory.  There  are  two  important 
models  for  a  long  memory  time  series  Y(t), 

A.  Transformable  to  a  short  memory  time  series  by  a 
non- invertible  filter. 


Y(t) 


Noninvertible 

Filter 


Y(t) 


Innovations 

Filter 


Yv(t) 


Long 

Memory 


Short 

Memory 


No 

Memory 


Yv(t) 


The  non- invertible  filter  is  chosen  to  be  a  difference 
operator  of  the  form 

Y (t)  -  G(L)  Y(t)  -  Y(t)  +  A(l)Y(t-l)+. . .+A(M)Y(t-M) . 


Then  Y(t)  has  sample  spectral  density 
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fY  (to)  -  |G(e27riuJ)  | 2  fYU) 

Finding  the  non- invertible  filter  can  be  regarded  as  an 
additive  decomposition  of  the  log  spectral  densities:  choose 
G  so  that 


log  fy(ai)  -  log  fy(u>)  +  log  |G(e2Tri“)|2 


looks  like  the  sample  spectral  density  of  a  short  memory  time 
series.  ARARMA  schemes  [Parzen  (1982)]  choose  G  by  best  lag 
non- stationary  autoregression. 

B.  Representable  as  the  sum  Y(t)  =  S(t)  +  N(t)  of  a  long 
memory  signal  plus  a  short  memory  noise. 


Y(t) 


r-  S(t)  long  memory 


L  N(t)  short  memory 


A  usual  approach  to  finding  S(t)  is  to  model  it  as  a  sum  of 
pure  harmonics : 

k 

S(t)  -  l  {A,  cos  2 7ru 1 1  +  B.  sin  2Truijt} 

J  J  J  '  J 

It  is  difficult  to  identify  how  many  terms,  and  what  frequencies 
to  include  in  S(t).  A  new  approach  could  be  based  on 
regarding  this  model  as  an  additive  decomposition  of  the  sample 
spectral  density: 
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fy(to)  W  fS<«>  • 

One  defines  S(t)  by  first  forming  fg(u>)  which  is  chosen  so 
that 

fjj(w)  “  fy(w)  -  f  g  (ill) 

looks  like  the  sample  spectral  density  of  a  short  memory  time 
series.  One  determines  a  threshold  value  C  which  is  to  be 
subtracted  from  fy(a>)  to  form  fg(u>)  by 

fs((o)  =  (fY(io)  -  C) 

One  determines  the  threshold  value  by  treating  f(k/Q)  as  a 
data  batch  to  be  studied  by  non-parametric  data  modeling 
methods  using  quantile  functions  [see  Parzen  (1979)]. 
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7.  The  Array  of  Spectral  Estimators 

Given  a  time  series  sample  (Y(t),  t=l,2,...,T)  a 
bewildering  array  of  estimated  spectral  densities  f(u>)  can  be 
formed. 

A.  Pre-processing.  To  analyze  a  time  series  sample 
Y(t),  t"l,...,T,  one  will  proceed  in  stages  which  often  involve 
the  subtraction  of  or  elimination  of  strong  effects  in  order 

to  see  more  clearly  weaker  patterns  in  the  time  series 
structure. 

The  aim  of  pre-processing  is  to  transform  Y(-)  to  a  new 
time  series  Y(-)  which  is  short  memory.  Some  basic  pre¬ 
processing  operations  are  memory  less  transformation  (such  as 
square  root  and  logarithm),  detrending,  "high  pass"  filtering, 
and  differencing.  One  usually  subtracts  out  the  sample  mean 
-  1  T 

Y  *  i  £  Y(t) ;  then  the  time  series  actually  processed  is 
1  t=l 

Y(t)  -  Y.  If  the  mean  Y  is  a  large  number,  it  should  be 
subtracted;  the  variations  in  Y(t)  are  then  the  variations  of 
Y(t)  about  its  mean.  The  sample  mean  Y  and  sample  variance 
R(0)  should  always  be  recorded. 

B.  Sample  Fourier  Transform  by  Data  Windowing,  Extending 
with  Zeroes ,  and  Fast  Fourier  Transform.  Let  Y(t)  denote  a 
pre-processed  time  series.  The  first  step  in  the  analysis 
could  be  to  compute  successive  autoregressive  schemes  using 
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operations  only  in  the  time  domain.  An  alternative  first  step 
is  the  computation  of  the  sample  Fourier  transform 

T 

iJj(u))  =  y  Y(t)  exp  (-2Triu)t) 
t=l 

at  an  qui-spaced  grid  of  frequencies  in  0<uk1,  of  the  form 
uj  =*  q,  k*0,...,Q-l.  We  call  Q  the  spectral  computation  number. 

One  should  chose  Q  >  T,  and  we  recommend  Q  2T. 

Prior  to  computing  ip  (to)  ,  one  should  extend  the  length  of 

*w  If 

the  time  series  by  addition  zeroes  to  it.  Then  ^(w),  aj= 
can  be  computed  using  the  Fast  Fourier  transform. 

If  the  time  series  may  be  long  memory  one  should  compute 
in  addition  a  sample  "data  windowed"  Fourier  transform 

T  t 

3  Y(t)W((jr)  exp  (-2irio)t) 

To  understand  the  effect  of  the  window,  one  replaces  Y(t)  by 

a  spectral  representation  Y(t)  -  exp  (2iriXt)  d<KA)  ;  then 

o 

^  T 

(|>w(w)  =  /1wT((D-A)  di|)(A)  where  wT(X)  -  J  W(^)  exp(-2iriAt)  . 

Considerations  involved  in  the  choice  of  data  windows  are 
discussed  in  Harris  (1978) . 

C.  Sample  Spectral  Density.  The  sample  spectral  density 
f(u)  is  obtained  essentially  by  squaring  and  normalizing  the 
sample  Fourier  transform; 
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f(w) 


~  k 

5  Jo  '♦$' 


w-gj,  k-0,1 . Q-l, 


D.  Nonparametric  kernel  spectral  density  estimator.  An 

A 

estimator  f(w)  of  the  spectral  density  is  called:  parametric 
when  it  corresponds  to  a  parametric  model  for  the  time  series 
(such  as  an  AR  or  ARMA  model);  non- parametric  otherwise.  A 
general  form  of  non-parametric  estimator  is  the  kernel  estimator, 


Ms  ~  /  \  ^TtitoV 


f(w>  -  l  k(J)  p(v)  e 


,  0<uxl. 


Y*  —  oo 


Two  popular  choices  of  kernel  are  the  Parzen  window  [Parzen 


(1961)] 


k(t)  *  l-6t2  +  6t 3  ,  | t |  <  0.5 


2  (1  -  Itl) 


0.5  <  Itl  <  1  , 


1  <  Itl 


and  the  spline- equivalent  window  [Parzen  (1958),  Cogbum  and 
Davis  (1974),  Wahba  (1980)] 


k(t) 


1 

i+t2r 


where  r  >  2  is  usually  chosen  to  equal  2  or  4.  The  problem  of 
determining  optimum  truncations  points  M  has  no  general  solution; 
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one  approach  is  to  choose  a  large  value  of  M  to  obtain  a 
preliminary  smoothing  of  the  sample  spectral  density. 

E.  Autoregressive  spectral  density  estimators.  The 
Yule-Walker  equations  are  solved  to  estimate  innovation  variances 

a 

a2,  to  which  are  applied  order  determining  criteria  (AIC,  CAT)  to 
determine  optimal  orders  m  and  also  to  test  for  white  noise. 

A 

The  value  of  0^  and  the  dynamic  range  of  the  autoregressive 
spectral  estimator  fm(oj)  are  used  to  determine  the  memory  type 
of  the  time  series  [Parzen  (1982)].  One  should  determine  a 
best  and  second  best  AR  order. 

F.  ARMA  spectral  density  estimations.  When  a  time  series 
is  classified  as  short  memory  the  approximating  AR  scheme  is 
used  to  form  the  MA(«>)  coefficients  which  are  used  to  form  a 
subset  regression  procedure  for  determining  the  best  fitting 
ARMA  scheme,  and  the  corresponding  ARMA  spectral  density 
estimator. 

We  do  not  believe  that  spectral  estimation  is  a  non- 
parametric  procedure  to  be  conducted  independently  of  model 
identification.  The  final  form  of  spectral  estimator  should 
be  based  on  an  identification  of  the  type  (AR,  MA,  or  ARMA)  of 
the  whitening  filter  of  a  short  memory  time  series. 

G.  Non- stationary  autoregression.  When  a  time  series 
is  classified  as  long  memory,  more  accurate  estimators  of 
autoregressive  coefficients  are  provided  by  minimizing  a 
"forward  and  backward"  least  squares  criterion 
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T  • 

l  (Y(t)  +  a  (1)  Y (t- 1)+.  .  .  a  (m)  Y(t-m))2 
t-mfl  “  m 

+  (Y(t-ra)  +  am(l)  Y (t-m+1)  +. . .+am(m)  Y(t))2, 

or  by  Burg  estimators  [for  references  to  descriptions  of  Burg’s 
algorithm,  see  Kay  and  Marple  (1981)]. 

When  several  harmonics  are  present  in  the  data,  whose 
frequencies  are  close  together,  least  squares  autoregressive 
coefficient  estimators  are  more  effective  than  Yule-Walker 
autoregressive  coefficient  estimators  in  providing  autoregressive 
spectral  estimators  which  exhibit  the  split  peaks  one  would 
like  to  see  in  the  estimated  spectral  density. 

H.  Spectral  density  estimators  based  on  inverse 
correlations  and  cepstral  correlations.  Additional  insight 
into  the  peaks  and  troughs  to  be  given  significance  in  the 
final  estimator  of  the  spectrum  of  a  short  memory  time  series 
can  be  provided  by  forming  nonparametric  kernel  estimators  of 
f  l(w)  and  log  f (w) . 

For  a  spectral  density  f(uj)  obeying  suitable  conditions, 
one  can  define  the  inverse-correlation  function  [see  Cleveland 
(1972),  Parzen  (1974),  Chatfield  (1979)] 

pi (v)  -  f1  e2irivul  f"1(o))  dw  v  f1  f"1(<1))  dw 
o  o 

and  the  cepstral-correlation  function  [see  Wahba  (1980)  for 
an  application] 
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Y(v)  =  /*  e^TTiva)  log  £(u)  du> 

o 

It  should  be  noted  that  the  inverse- correlation  function  is 
non-negative  definite.  However  the  cepstral-correlation 
function  is  not.  These  new  types  of  correlation  functions  are 
introduced  because  they  may  provide  more  parsimonious 
parametrizations  in  the  sense  that  they  decay  to  0  faster 
than  does  the  correlation  function.  Statistical  inference 
(from  a  sample)  of  the  probability  law  of  a  time  series  often 
achieves  greatest  statistical  efficiency  by  using  the  most 
parsimonious  parametrizations.  Thus  to  form  estimators  £ (u) 
of  the  spectral  density  f(oj)  from  a  raw  estimator  f(u>)» 
greater  precision  may  be  attained  by  first  forming  estimators 
{f-1(w)K  and  (log  f(a))}«  of  the  inverse  or  logarithm  of  the 
spectral  density.  Autoregressive  spectral  estimation  may  be 
regarded  as  an  approach  to  estimating  f(w)  by  first  estimating 
f~^(u)  [Durrani  &  Arslanian  (1982)]. 
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8.  Quantile  approach  to  non-Gaussian  data  analysis 

A  promising  new  approach  to  the  analysis  of  non-Gaussian 
time  series  analysis  provided  by  the  probability  density-quantile 
function  introduced  by  Parzen  (1979).  This  section  should  be 
of  theoretical  interest  to  spectral  analysis  because  of  the 
many  isomorphisms  that  exist  between  the  statistical  theory 
of  spectral  density  estimators  and  probability  density-quantile 
estimators.  It  should  be  of  interest  as  a  technique  to  answer 
statistical  questions  about  non-Gaussian  time  series  for  which 
correlation  functions  are  not  an  appropriate  tool.  Here  we 
have  only  space  to  mention  some  basic  definitions. 

The  probability  distribution  of  a  continuous  random 
variable  Yis  most  insightfully  described  not  by  its  distribution 
function  Fy(y)  =  Pr[Y<y]  and  its  probability  density  f(.y)-F'(y), 
but  by  its  quantile  function 

Qy(u)  *  Fy^ (u)  =  inf{y:FY(y)  >  u} 
and  its  density  quantile  function 

=  fY  ^Y  ^  ^  ‘ 

The  bivariate  distribution  fy(t)  Y(t+v)^l  ^2^  ^est 
described  by  the  joint  density-quantile  function 

fY(t),Y(t+v)  (QY(t)(vV'  QY(t+v)^U2^’ 
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Techniques  for  estimating  a  joint  density-quantile  function 
are  currently  under  development  by  Woodfield,  using  the  dependence 
density  d(u^,u2)  defined  for  two  random  variables  X  and  Y  by 

fj^  Qy  (u2 ^  ^  y^ui»u2^x^7^u1^Y^Y^u2^ 

Estimators  of  d^  y(u^,u2)  are  obtained  from  raw  estimators  of 

Djj  y^ui >u2^  *  FX  y^X^ul^  'Qy^u2^ 
which  is  the  joint  distribution  function  of 

U1  =  Fx(X)'  U2  =  fy<Y> 

Information  criteria  can  be  used  to  test  the  independence  of 
X  and  Y,  and  to  estimate  their  joint  density.  Note  that 
I  (Y|  X)  ~  I<fX,Y;fXfY)  =  -H(dX)Y(uru2)) 


normal  case  with  RHD-0.5 


density-quantile  for  bivariate 
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